A comparison of acoustic coding models for speech-driven facial animation

Authors

  • Praveen K. Kakumanu
  • Anna Esposito
  • Oscar N. Garcia
  • Ricardo Gutierrez-Osuna
Abstract

This article presents a thorough experimental comparison of several acoustic modeling techniques by their ability to capture information related to orofacial motion. These models include (1) Linear Predictive Coding and Linear Spectral Frequencies, which model the dynamics of the speech production system, (2) Mel Frequency Cepstral Coefficients and Perceptual Critical Feature Bands, which encode perceptual cues of speech, (3) spectral energy and fundamental frequency, which capture prosodic aspects, and (4) two hybrid methods that combine information from the previous models. We also consider a novel supervised procedure based on Fisher's Linear Discriminants to project acoustic information onto a low-dimensional subspace that best discriminates different orofacial configurations. Prediction of orofacial motion from speech acoustics is performed using a non-parametric k-nearest-neighbors procedure. The sensitivity of this audio–visual mapping to coarticulation effects and spatial locality is thoroughly investigated. Our results indicate that the hybrid use of articulatory, perceptual and prosodic features of speech, combined with a supervised dimensionality-reduction procedure, is able to outperform any individual acoustic model for speech-driven facial animation. These results are validated on the 450 sentences of the TIMIT compact dataset. © 2005 Elsevier B.V. All rights reserved.
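
The k-nearest-neighbors mapping mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state the distance metric, the value of k, or how neighbors are combined, so the Euclidean distance, the default k = 3, and the plain averaging below are assumptions made for illustration only; the function name `knn_predict` and the toy data are likewise hypothetical.

```python
import math


def knn_predict(train_audio, train_visual, query, k=3):
    """Predict a visual (orofacial) vector for one acoustic frame.

    Assumption: the prediction is the unweighted mean of the visual
    vectors paired with the k training frames whose acoustic vectors
    are closest to `query` in Euclidean distance.
    """
    if len(train_audio) != len(train_visual):
        raise ValueError("audio and visual training sets must be paired")
    if not 1 <= k <= len(train_audio):
        raise ValueError("k must be between 1 and the training set size")

    # Rank training frames by acoustic distance to the query frame.
    order = sorted(
        range(len(train_audio)),
        key=lambda i: math.dist(train_audio[i], query),
    )
    nearest = order[:k]

    # Average the corresponding visual vectors, one dimension at a time.
    dim = len(train_visual[0])
    return [sum(train_visual[i][d] for i in nearest) / k for d in range(dim)]


# Toy data (hypothetical): 2-D acoustic features, 1-D visual parameter.
toy_audio = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
toy_visual = [[0.0], [2.0], [4.0], [100.0]]
prediction = knn_predict(toy_audio, toy_visual, [0.1, 0.1], k=3)
```

In this toy setup the distant fourth frame is excluded from the neighborhood, so it does not influence the prediction. In a setting like the one the abstract describes, the acoustic vectors would be the (possibly dimensionality-reduced) features of each speech frame.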

Similar resources

Speech Recognition with Hidden Markov Models in Visual Communication

Speech is produced by the vibration of the vocal cords and the configuration of the articulators. Because some of these articulators are visible, there is an inherent relationship between the acoustic and the visual forms of speech. This relationship has been historically used in lipreading. Today's advanced computer technology opens up new possibilities to exploit the correlation between acou...

Natural head motion synthesis driven by acoustic prosodic features

Natural head motion is important to realistic facial animation and engaging human-computer interactions. In this paper, we present a novel data-driven approach to synthesize appropriate head motion by sampling from trained Hidden Markov Models (HMMs). First, while an actress recited a corpus specifically designed to elicit various emotions, her 3D head motion was captured and further processed ...

Speech and Expression Driven Animation of a Video-Realistic Appearance Based Hierarchical Facial Model

We describe a new facial animation system based on a hierarchy of morphable sub-facial appearance models. The innovation in our approach is that through the hierarchical model, parametric control is available for the animation of multiple sub-facial areas. We animate these areas automatically both from speech to produce lip-synching, and natural pauses and hesitations and using specific tempora...

Carnival-combining speech technology and computer animation.

Speech is powerful information technology and the basis of human interaction. By emitting streams of buzzing, popping, and hissing noises from our mouths, we transmit thoughts, intentions, and knowledge of the world from one mind to another. We’re accustomed to thinking of speech as an acoustic, auditory phenomenon. However, speech is also visible. Although the primary function of speech is to ...

Speech-driven 3d facial animation for mobile entertainment

This paper presents an entertainment-oriented application for mobile service, which generates customized speech-driven 3D facial animation and delivers to end-user by MMS (Multimedia Messaging Service). Some important methods of this application are discussed, including the 3D facial model based on 3 photos, the 3D facial animation driven by speech or text on-line and the video format transform...

Journal:
  • Speech Communication

Volume 48  Issue 

Pages  -

Publication date: 2006